
Bioinformatics pipeline summary


An overview of the steps in the pipeline

Author: Adrien Taudière

Date: October 19, 2024

Summary of the bioinformatic pipeline

library(knitr)
library(targets)
library(MiscMetabar)
tar_glimpse(
  script = here::here("_targets.R"),
  targets_only = TRUE,
  callr_arguments = list(show = FALSE)
)

Load phyloseq object from targets store

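# To read the object from the targets store, call tar_read() with the name of
# the target that holds the phyloseq object (that name is defined in _targets.R):
# d_pq <- tar_read(<target_name>)
# Here the data_fungi example dataset from MiscMetabar is used instead.
d_pq <- data_fungi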

The {targets} package is at the core of this project. If you are not familiar with {targets}, please read the introduction of its user manual.

The {targets} package stores … targets in a folder. Objects from this folder can be loaded into the current environment (tar_load()) or read and returned as a value (tar_read()).
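
As an illustration of the difference between tar_read() and tar_load(), here is a minimal sketch that builds a throwaway pipeline in a temporary directory. The target names raw_values and total and the folder name targets_demo are hypothetical and are not part of this project's _targets.R.

```r
library(targets)

# Work in a temporary directory so that no real project is touched.
demo_dir <- file.path(tempdir(), "targets_demo")
dir.create(demo_dir, showWarnings = FALSE)
old_wd <- setwd(demo_dir)

# Write a hypothetical two-target pipeline to _targets.R.
tar_script({
  library(targets)
  list(
    tar_target(raw_values, c(1, 2, 3)),
    tar_target(total, sum(raw_values))
  )
}, ask = FALSE)

# Run the pipeline; results are written to the _targets/ store folder.
tar_make(reporter = "silent")

# tar_read() returns the stored value, so it can be assigned to any name.
total_value <- tar_read(total)

# tar_load() creates an object named after the target in the current environment.
tar_load(raw_values)

# Return to the original working directory.
setwd(old_wd)
```

In this report, the same pattern would apply to the target holding the phyloseq object, whatever its name is in the project's pipeline.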

Sample data

DT::datatable(d_pq@sam_data)

Sequences, samples and clusters across the pipeline

formattable_pq(
  d_pq,
  "Height",
  min_nb_seq_taxa = 10000,
  formattable_args = list("Phylum" = FALSE),
  log10trans = TRUE
)
54 samples were discarded due to NA in variable modality
Cleaning suppress 0 taxa (  ) and 22 sample(s) ( BE9-006-B_S27_MERGED.fastq.gz / BG7-010-H_S31_MERGED.fastq.gz / C21-NV1-M_S64_MERGED.fastq.gz / D9-027-B_S83_MERGED.fastq.gz / DJ2-008-B_S87_MERGED.fastq.gz / DJ2-008-H_S88_MERGED.fastq.gz / DY5-004-H_S97_MERGED.fastq.gz / DY5-004-M_S98_MERGED.fastq.gz / E9-009-B_S100_MERGED.fastq.gz / E9-009-H_S101_MERGED.fastq.gz / J18-004-B_S114_MERGED.fastq.gz / J18-004-H_S115_MERGED.fastq.gz / J18-004-M_S116_MERGED.fastq.gz / N22-001-B_S129_MERGED.fastq.gz / O20-X-B_S139_MERGED.fastq.gz / O21-007-M_S144_MERGED.fastq.gz / R28-008-H_S159_MERGED.fastq.gz / R28-008-M_S160_MERGED.fastq.gz / W26-001-H_S166_MERGED.fastq.gz / W26-001-M_S167_MERGED.fastq.gz / Y29-007-H_S182_MERGED.fastq.gz / Y29-007-M_S183_MERGED.fastq.gz ).
Number of non-matching ASV 0
Number of matching ASV 1420
Number of filtered-out ASV 1400
Number of kept ASV 20
Number of kept samples 109
Cleaning suppress 0 taxa and 0 samples.
Joining with `by = join_by(OTU)`
OTU    Order              Family                             Genus             High  Low   Middle  proportion_samp  nb_seq
ASV2   Xylariales         Diatrypaceae                       Eutypa            4.43  4.78  3.65    0.74             4.97
ASV6   Xylariales         Xylariaceae                        Xylaria           4.51  4.00  3.69    0.14             4.73
ASV7   Russulales         Stereaceae                         NA                4.20  3.58  3.81    0.24             4.68
ASV8   Russulales         Stereaceae                         Stereum           4.16  3.52  4.08    0.50             4.67
ASV10  NA                 NA                                 NA                4.43  3.07  3.73    0.26             4.61
ASV12  Hymenochaetales    Schizoporaceae                     Xylodon           3.26  3.91  3.13    0.18             4.58
ASV13  NA                 NA                                 NA                3.77  4.22  2.96    0.45             4.52
ASV18  Russulales         Stereaceae                         Stereum           4.15  3.04  2.74    0.18             4.44
ASV19  Saccharomycetales  Debaryomycetaceae                  Scheffersomyces   3.22  4.30  3.39    0.14             4.43
ASV22  Xylariales         Xylariaceae                        Xylaria           4.31  0.00  3.51    0.02             4.37
ASV23  Xylariales         Xylariaceae                        Daldinia          4.35  0.00  0.00    0.02             4.35
ASV26  Russulales         Stereaceae                         Stereum           2.09  3.06  4.15    0.07             4.32
ASV27  Polyporales        Steccherinaceae                    Antrodiella       3.23  4.28  0.90    0.02             4.31
ASV28  Pertusariales      Pertusariaceae                     Pertusaria        4.06  3.01  3.69    0.22             4.30
ASV35  Polyporales        Polyporaceae                       Fomes             3.70  2.94  3.97    0.04             4.18
ASV41  Agaricales         Tricholomataceae                   Mycena            0.00  1.08  4.02    0.05             4.11
ASV43  Pleosporales       Amorosiaceae                       Angustimassarina  1.65  4.04  3.00    0.16             4.08
ASV45  Helotiales         Helotiaceae                        Scytalidium       4.03  0.48  1.62    0.05             4.07
ASV46  Atractiellales     Atractiellales_fam_Incertae_sedis  Helicogloea       3.09  3.91  3.05    0.04             4.04
ASV53  Polyporales        Polyporaceae                       Fomes             3.61  1.91  3.77    0.03             4.00

Session Information

Session information is detailed below. More information about the machine, the system, and the Python and R packages is available in the file data_final/information_run.txt.
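
How data_final/information_run.txt is generated is not shown in this report. As a rough sketch of one way session details can be saved to a text file from R, the example below writes a timestamp and the output of sessionInfo() to a file in a temporary directory. The path and the "Run date:" line are assumptions made for illustration, not the pipeline's actual output format.

```r
# Hypothetical output location (the real pipeline uses data_final/information_run.txt).
info_file <- file.path(tempdir(), "information_run.txt")

# Collect a timestamp and the printed session information as character lines.
info_lines <- c(
  paste("Run date:", format(Sys.time(), "%Y-%m-%d %H:%M:%S")),
  "",
  utils::capture.output(utils::sessionInfo())
)

# Write the lines to the text file, one element per line.
writeLines(info_lines, info_file)

# Show the first lines of the file to check what was written.
head(readLines(info_file), 3)
```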

sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Debian GNU/Linux 12 (bookworm)

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.11.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.11.0

locale:
 [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8    
 [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8   
 [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Paris
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] MiscMetabar_0.9.3 purrr_1.0.2       dplyr_1.1.4       dada2_1.32.0     
[5] Rcpp_1.0.13       ggplot2_3.5.1     phyloseq_1.48.0   targets_1.8.0    
[9] knitr_1.48       

loaded via a namespace (and not attached):
  [1] bitops_1.0-9                deldir_2.0-4               
  [3] gridExtra_2.3               permute_0.9-7              
  [5] rlang_1.1.4                 magrittr_2.0.3             
  [7] ade4_1.7-22                 matrixStats_1.4.1          
  [9] compiler_4.4.1              mgcv_1.9-1                 
 [11] png_0.1-8                   callr_3.7.6                
 [13] vctrs_0.6.5                 reshape2_1.4.4             
 [15] stringr_1.5.1               pwalign_1.0.0              
 [17] pkgconfig_2.0.3             crayon_1.5.3               
 [19] fastmap_1.2.0               backports_1.5.0            
 [21] XVector_0.44.0              utf8_1.2.4                 
 [23] Rsamtools_2.20.0            rmarkdown_2.28             
 [25] UCSC.utils_1.0.0            ps_1.8.0                   
 [27] xfun_0.48                   cachem_1.1.0               
 [29] zlibbioc_1.50.0             GenomeInfoDb_1.40.1        
 [31] jsonlite_1.8.9              biomformat_1.32.0          
 [33] rhdf5filters_1.16.0         DelayedArray_0.30.1        
 [35] Rhdf5lib_1.26.0             BiocParallel_1.38.0        
 [37] jpeg_0.1-10                 parallel_4.4.1             
 [39] cluster_2.1.6               R6_2.5.1                   
 [41] bslib_0.8.0                 RColorBrewer_1.1-3         
 [43] stringi_1.8.4               jquerylib_0.1.4            
 [45] GenomicRanges_1.56.1        SummarizedExperiment_1.34.0
 [47] iterators_1.0.14            IRanges_2.38.1             
 [49] Matrix_1.7-0                splines_4.4.1              
 [51] igraph_2.0.3                tidyselect_1.2.1           
 [53] viridis_0.6.5               rstudioapi_0.16.0          
 [55] abind_1.4-8                 yaml_2.3.10                
 [57] vegan_2.6-8                 codetools_0.2-20           
 [59] hwriter_1.3.2.1             processx_3.8.4             
 [61] lattice_0.22-6              tibble_3.2.1               
 [63] plyr_1.8.9                  Biobase_2.64.0             
 [65] withr_3.0.1                 ShortRead_1.62.0           
 [67] evaluate_1.0.0              survival_3.7-0             
 [69] RcppParallel_5.1.9          formattable_0.2.1          
 [71] Biostrings_2.72.1           pillar_1.9.0               
 [73] BiocManager_1.30.25         MatrixGenerics_1.16.0      
 [75] DT_0.33                     renv_1.0.9                 
 [77] foreach_1.5.2               stats4_4.4.1               
 [79] generics_0.1.3              rprojroot_2.0.4            
 [81] S4Vectors_0.42.1            munsell_0.5.1              
 [83] scales_1.3.0                base64url_1.4              
 [85] glue_1.8.0                  tools_4.4.1                
 [87] interp_1.1-6                data.table_1.16.0          
 [89] GenomicAlignments_1.40.0    visNetwork_2.1.2           
 [91] rhdf5_2.48.0                grid_4.4.1                 
 [93] tidyr_1.3.1                 ape_5.8                    
 [95] crosstalk_1.2.1             latticeExtra_0.6-30        
 [97] colorspace_2.1-1            nlme_3.1-166               
 [99] GenomeInfoDbData_1.2.12     cli_3.6.3                  
[101] fansi_1.0.6                 viridisLite_0.4.2          
[103] S4Arrays_1.4.1              gtable_0.3.5               
[105] sass_0.4.9                  digest_0.6.37              
[107] BiocGenerics_0.50.0         SparseArray_1.4.8          
[109] htmlwidgets_1.6.4           htmltools_0.5.8.1          
[111] multtest_2.60.0             lifecycle_1.0.4            
[113] here_1.0.1                  httr_1.4.7                 
[115] secretbase_1.0.3            MASS_7.3-61                

Citation

BibTeX citation:
@online{taudière2024,
  author = {Taudière, Adrien},
  title = {Bioinformatics Pipeline Summary},
  date = {2024-10-19},
  langid = {en}
}
For attribution, please cite this work as:
Taudière, Adrien. 2024. “Bioinformatics Pipeline Summary.” October 19, 2024.